Rotterdam
fastml: Guarded Resampling Workflows for Safer Automated Machine Learning in R
Korkmaz, Selcuk, Goksuluk, Dincer, Karaismailoglu, Eda
Preprocessing leakage arises when scaling, imputation, or other data-dependent transformations are estimated before resampling, inflating apparent performance while remaining hard to detect. We present fastml, an R package that provides a single-call interface for leakage-aware machine learning through guarded resampling, where preprocessing is re-estimated inside each resample and applied to the corresponding assessment data. The package supports grouped and time-ordered resampling, blocks high-risk configurations, audits recipes for external dependencies, and includes sandboxed execution and integrated model explanation. We evaluate fastml with a Monte Carlo simulation contrasting global and fold-local normalization, a usability comparison with tidymodels under matched specifications, and survival benchmarks across datasets of different sizes. The simulation demonstrates that global preprocessing substantially inflates apparent performance relative to guarded resampling. fastml matched held-out performance obtained with tidymodels while reducing workflow orchestration, and it supported consistent benchmarking of multiple survival model classes through a unified interface.
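The contrast between global and fold-local normalization described in this abstract can be illustrated with a minimal sketch. This is not fastml's API (fastml is an R package); the function names `fit_scaler`, `apply_scaler`, `k_folds`, `guarded_scaling`, and `global_scaling` are hypothetical, and the interleaved fold assignment is an assumption made only to keep the example short.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Estimate centering and scaling parameters from the given values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 for constant input
    return mu, sigma


def apply_scaler(values, params):
    """Standardize values with previously estimated parameters."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


def k_folds(n, k):
    """Yield (analysis_idx, assessment_idx) pairs using interleaved folds."""
    for i in range(k):
        assess = list(range(i, n, k))
        held_out = set(assess)
        analysis = [j for j in range(n) if j not in held_out]
        yield analysis, assess


def guarded_scaling(x, k):
    """Re-estimate the scaler inside each resample (no leakage)."""
    results = []
    for analysis, assess in k_folds(len(x), k):
        params = fit_scaler([x[j] for j in analysis])
        scaled = apply_scaler([x[j] for j in assess], params)
        results.append({"assess": assess, "params": params, "scaled": scaled})
    return results


def global_scaling(x, k):
    """Estimate the scaler once on all rows before resampling (leaky)."""
    params = fit_scaler(x)  # assessment rows influence these parameters
    results = []
    for _, assess in k_folds(len(x), k):
        scaled = apply_scaler([x[j] for j in assess], params)
        results.append({"assess": assess, "params": params, "scaled": scaled})
    return results


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 100.0, 6.0]
    for g, l in zip(guarded_scaling(x, 3), global_scaling(x, 3)):
        print(g["assess"], g["params"], l["params"])
```

In the guarded version, an extreme value that falls in the assessment set has no influence on the parameters used to scale it, which is the property the abstract attributes to guarded resampling. In the global version, the same value shifts the mean and standard deviation applied to every fold.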
- Europe > Netherlands > South Holland > Rotterdam (0.04)
- North America > United States > Wisconsin (0.04)
- North America > United States > New York > New York County > New York City (0.04)
A Causal Framework for Evaluating ICU Discharge Strategies
Simha, Sagar Nagaraj, Ortholand, Juliette, Dongelmans, Dave, Workum, Jessica D., Thijssens, Olivier W. M., Abu-Hanna, Ameen, Cinà, Giovanni
In this applied paper, we address the difficult open problem of when to discharge patients from the Intensive Care Unit. This can be conceived as an optimal stopping scenario with three added challenges: 1) the evaluation of a stopping strategy from observational data is itself a complex causal inference problem, 2) the composite objective is to minimize the length of intervention and maximize the outcome, but the two cannot be collapsed to a single dimension, and 3) the recording of variables stops when the intervention is discontinued. Our contributions are two-fold. First, we generalize the implementation of the g-formula Python package, providing a framework to evaluate stopping strategies for problems with the aforementioned structure, including positivity and coverage checks. Second, with a fully open-source pipeline, we apply this approach to MIMIC-IV, a public ICU dataset, demonstrating the potential for strategies that improve upon current care.
- Europe > Netherlands > South Holland > Rotterdam (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Double Machine Learning for Static Panel Data with Instrumental Variables: New Method and Applications
Baiardi, Anna, Clarke, Paul S., Naghi, Andrea A., Polselli, Annalivia
Panel data methods are widely used in empirical analysis to address unobserved heterogeneity, but causal inference remains challenging when treatments are endogenous and confounding variables are high-dimensional and potentially nonlinear. Standard instrumental variables (IV) estimators, such as two-stage least squares (2SLS), become unreliable when instrument validity requires flexibly conditioning on many covariates with potentially nonlinear effects. This paper develops a Double Machine Learning estimator for static panel models with endogenous treatments (panel IV DML), and introduces weak-identification diagnostics for it. We revisit three influential migration studies that use shift-share instruments. In these settings, instrument validity depends on a rich covariate adjustment. In one application, panel IV DML strengthens the predictive power of the instrument and broadly confirms 2SLS results. In the other cases, flexible adjustment makes the instruments weak, leading to substantially more cautious causal inference than conventional 2SLS. Monte Carlo evidence supports these findings, showing that panel IV DML improves estimation accuracy under strong instruments and delivers more reliable inference under weak identification.
- Oceania > Australia (0.04)
- North America > United States (0.04)
- South America > Argentina (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)